Supervised Categorization of JavaScript™ Using Program Analysis Features

Authors

  • Lu Wei
  • Bimlesh Wadhwa
Abstract

Web pages often embed scripts for a variety of purposes, including advertising and dynamic interaction. Understanding embedded scripts and their purposes can help interpret a web page or provide crucial information about it. I have developed a functionality-based categorization of JavaScript, the most widely used web page scripting language, and then treat understanding embedded scripts as a text categorization problem. I show how traditional information retrieval methods can be augmented with features distilled from domain knowledge of JavaScript and from program analysis to improve classification performance. Experiments on the standard WT10G web page corpus show that these techniques eliminate over 50% of the errors made by a standard text classification baseline.

Subject Descriptors: H.3.3 [Information Search and Retrieval]; F.3.2 [Semantics of Programming Languages]; D.2.8 [Metrics]
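The idea of augmenting a text classifier with JavaScript domain knowledge can be illustrated with a minimal sketch. This is not the author's method: it assumes a multinomial Naive Bayes classifier and uses simple lexical counts of well-known browser API calls as a stand-in for real program analysis features. The API list (`API_CALLS`), the category names, and the training snippets are all hypothetical, chosen only for illustration. Each recognized API call is added to the bag of words as an extra pseudo-token, so the classifier can weigh it alongside ordinary identifier tokens.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical "domain knowledge" features: browser API calls whose presence
# hints at a script's function. This is a lexical stand-in, not real program analysis.
API_CALLS = ("document.write", "window.open", "setTimeout", "addEventListener", "location.href")

TOKEN_RE = re.compile(r"[A-Za-z_$][\w$]*")  # JavaScript-style identifiers


def extract_features(script: str) -> Counter:
    """Bag of lowercased identifier tokens plus one pseudo-token per known API call."""
    feats = Counter(tok.lower() for tok in TOKEN_RE.findall(script))
    for api in API_CALLS:
        n = script.count(api)
        if n:
            feats["API::" + api] += n
    return feats


class NaiveBayes:
    """Multinomial Naive Bayes with Laplace (add-alpha) smoothing."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha
        self.class_counts = Counter()          # documents per label
        self.feat_counts = defaultdict(Counter)  # feature counts per label
        self.vocab = set()

    def fit(self, scripts, labels):
        for script, label in zip(scripts, labels):
            self.class_counts[label] += 1
            feats = extract_features(script)
            self.feat_counts[label].update(feats)
            self.vocab.update(feats)
        return self

    def predict(self, script: str) -> str:
        feats = extract_features(script)
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.feat_counts[label]
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for feat, n in feats.items():
                if feat not in self.vocab:  # ignore features never seen in training
                    continue
                score += n * math.log((counts[feat] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy training data with two made-up categories.
train = [
    ("window.open('ad.html', 'popup', 'width=300')", "popup"),
    ("var w = window.open(url); w.focus();", "popup"),
    ("if (form.email.value == '') { alert('Email required'); return false; }", "validation"),
    ("function check(f) { if (f.name.value == '') { alert('Name?'); return false; } return true; }", "validation"),
]
model = NaiveBayes().fit([s for s, _ in train], [y for _, y in train])
print(model.predict("window.open('promo.html')"))  # → popup
```

Encoding domain features as pseudo-tokens keeps the baseline classifier unchanged. A richer system would replace the substring counts with features derived from actual parsing or data-flow analysis of the script.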

Similar resources

Supervised Categorization of JavaScript™ Using Program Analysis Features

Web pages often embed scripts for a variety of purposes, including advertising and dynamic interaction. Understanding embedded scripts and their purpose can often help to interpret or provide crucial information about the web page. We have developed a functionality-based categorization of JavaScript, the most widely used web page scripting language. We then view understanding embedded scripts a...

Detection and Analysis of Shellcode in Malicious Documents

A shellcode is a code snippet used as a payload when exploiting a software vulnerability. In recent attack trends, shellcode embedded in documents has become one of the most widely used vectors for targeted attacks. Such documents are significant because they carry dynamic content, can access URLs, and can be camouflaged easily. Most security mechanisms are not equipped to deal with these weaponised document...

The Reliability of Metrics Based on Graded Relevance

Improving weak ad-hoc retrieval by Web assistance and data fusion; Query expansion with the minimum relevance judgments; Improved concurrency control technique with lock-free querying for multi-dimensional index structure; A color-based image retrieval method using color distribution and common bitmap; A probabilistic model for music recommendation considering audio features...

Text Categorization using the Semi-Supervised Fuzzy c-Means Algorithm

Text Categorization (TC) is the automated assignment of text documents to predefined categories based on their contents. In recent years, TC has become especially important in Information Retrieval, where information needs have grown tremendously with the rapid growth of textual information sources such as the Internet. In this paper, we compare, for text categoriz...

A Practical Blended Analysis for Dynamic Features in JavaScript

JavaScript is widely used in Web applications; however, its dynamism renders static analysis ineffective. Our JavaScript Blended Analysis Framework is designed to handle JavaScript dynamic features. It performs a flexible combined static/dynamic analysis. The blended analysis focuses static analysis on a dynamic calling structure collected at runtime in a lightweight manner, and refines the sta...


Publication date: 2005